RESEARCH ARTICLE 



Advances in Cognitive Psychology 



Chunking or not chunking? 

How do we find words 

in artificial language learning? 



Ana Franco and Arnaud Destrebecqz



Cognition, Consciousness, and Computation Group, Université Libre de Bruxelles, Belgium



ABSTRACT

What is the nature of the representations acquired in implicit statistical learning? Recent results
in the field of language learning have shown that adults and infants are able to find the words of
an artificial language when exposed to a continuous auditory sequence consisting of a random
ordering of these words. Such performance can only be based on processing the transitional
probabilities between sequence elements. Two different kinds of mechanisms may account for these
data: Participants may either parse the sequence into smaller chunks corresponding to the words
of the artificial language, or they may become progressively sensitive to the actual values of the
transitional probabilities between syllables. The two accounts are difficult to differentiate because
they make similar predictions in comparable experimental settings. In this study, we present two
experiments that aimed at contrasting these two theories. In these experiments, participants had
to learn two sets of pseudo-linguistic regularities, Language 1 (L1) and Language 2 (L2), presented
in the context of a serial reaction time task. L1 and L2 were either unrelated (none of the syllabic
transitions of L1 were present in L2) or partly related (some of the intra-word transitions of L1
were used as inter-word transitions of L2). The two accounts make opposite predictions in these two
settings. Our results indicate that the nature of the representations depends on the learning
condition. When cues were presented to facilitate parsing of the sequence, participants learned the
words of the artificial language. However, when no cues were provided, performance was strongly
influenced by the employed transitional probabilities.

KEYWORDS

implicit statistical learning, transitional probabilities, chunking, serial reaction time task



INTRODUCTION 

When faced with a complex structured domain, human learners tend 
to behave as if they extract the underlying rules of the material. In an 
artificial grammar learning experiment, for instance, participants are 
first requested to memorize a series of letter strings following the rules 
of a finite-state grammar. They are not informed of the existence of
those rules, however. In a second phase of the experiment, when asked 
to classify novel strings as grammatical or not, they usually perform 
above chance level but remain generally unable to verbalize much of 
the rules. Such a dissociation has been initially attributed to the uncon- 
scious or implicit learning of the underlying rules (Reber, 1967, 1989). 

This interpretation has been heavily debated ever since, however.
What is the exact nature of learning? Is performance based on learning
the abstract rules of the material or on the surface features of the
training items, such as the frequencies of individual elements or chunks?
Most recent implicit learning studies suggest that the latter view
provides a better account of performance (e.g., Perruchet & Pacteau, 1990).
Several experiments have indeed demonstrated that performance was
based on "fragmentary" learning. In other words, learning would depend
on the memorization of fragments of the stimuli presented to the
subjects rather than on an abstract rule-extraction process (Meulemans
& Van der Linden, 1997; Perruchet & Amorim, 1992; Perruchet &
Pacteau, 1990; E. Servan-Schreiber & Anderson, 1990).

Over the last few years, a series of experimental results have
provided new insights into the question of the nature of the
representations involved in implicit learning. Research on language acquisition
has indeed shown that 8-month-old infants are sensitive to statistical
information (Jusczyk, Luce, & Charles-Luce, 1994; Saffran, Aslin, &
Newport, 1996; Saffran, Johnson, Aslin, & Newport, 1999) and
capable of learning distributional relationships between linguistic units
presented in the continuous speech stream formed by an artificial
language (Gomez & Gerken, 1999; Jusczyk, Houston, & Newsome, 1999;
Perruchet & Desaulty, 2008; Saffran et al., 1996). The seminal studies
by Saffran and collaborators have shown that infants, children, and
adults were able to find the "words" of an artificial language when
presented with a concatenation of those plurisyllabic sequences (e.g.,
batubi, dutaba) presented in a random order and forming a continuous
stream without any phonological or prosodic markers. Only the
transitional probabilities (TPs) between syllables can be used to discover
the word boundaries. Indeed, as the next word in the stream can never
be anticipated, TPs are higher within words than between words.

Other studies have indicated that these mechanisms are not re- 
stricted to linguistic material but also apply to auditory non-linguistic 
stimuli (e.g., Saffran et al., 1999) or to visual stimuli (e.g., Fiser & 
Aslin, 2002; Hunt & Aslin, 2001). In the same way, implicit sequence 
learning studies have indicated that human learners are good at de- 
tecting the statistical regularities present in a serial reaction time (SRT) 
task. Altogether, these data suggest that statistical learning depends
on associative learning mechanisms picking up the input's statistical
constraints rather than on the existence of a "rule abstractor device"
(Perruchet, Tyler, Galland, & Peereman, 2004).

Different computational models have been proposed to account for 
the data. On the one hand, according to the simple recurrent network 
model (SRN; Elman, 1990; see also Cleeremans, 1993; Cleeremans &
McClelland, 1991), learning is based on the development of associa- 
tions between the temporal context in which the successive elements 
occur and their possible successors. Over training, the network learns 
to provide the best prediction of the next target in a given context, 
based on the transitional probabilities between the different sequence 
elements. On the other hand, models such as PARSER (Perruchet &
Vinter, 1998) consider learning as an attention-based parsing process
that results in the formation of distinctive, unitary, rigid
representations, or chunks. In contrast with the SRN, PARSER finds and stores
the most frequent sequences in memory files, or a mental lexicon. Thus,
both models are based on processing statistical regularities, but only
PARSER leads to the formation of "word-like" units.

In a recent paper, Frank, Goldwater, Griffiths, and Tenenbaum
(2010) classified the SRN and PARSER as examples of transition-finding
and chunking models, respectively. The first model implements
a bracketing strategy, according to which participants are assumed to
insert boundaries into the sequence of speech. The second model
implements a clustering strategy that consists in grouping certain speech
sequences together into units (Giroux & Rey, 2009; Swingley, 2005).

Although the processes and representations assumed by
these two classes of models are quite different, contrasting their
assumptions is difficult as they make similar predictions in most
experimental settings. For instance, in an artificial language learning
experiment (including the pseudowords batubi and dutaba), as the
representations that emerge in either model reflect the strength of the
associations between sequence elements, both predict better
processing of intra-word (e.g., ba-tu) than inter-word transitions
(e.g., bi-du) as well as successful recognition of the words of the
artificial language (Saffran, Newport, Aslin, Tunick, & Barrueco, 1997).
Besides, as noted by Perruchet and Pacton (2006), researchers in
statistical learning tend to acknowledge the existence of chunk-like
representations. Jenny Saffran (2001) showed, for instance, that when
presented with an artificial language, 8-month-old infants develop word-like
representations rather than merely probabilistically related sequences
of sounds.

Some recent studies were conducted to distinguish the two models. 
In a recent experiment, Giroux and Rey (2009) compared lexical and 
sublexical recognition performance of adults after hearing 2 or 10 min 
of an artificial spoken language. A sublexical unit, or part-word, is a
sequence of syllables composed of the end of a word and of the
beginning of another word. They found that, as predicted by PARSER but
not by the SRN, part-word recognition performance did not increase
with longer exposure and that performance on words was better than
performance on part-words only after 10 min.

In another study, Endress and Mehler (2009) presented
participants with an artificial language containing three-syllable words. Each
of these words was generated by modifying one syllable of what they
called a phantom word, which was never actually presented during the
experiment. Endress and Mehler observed that, after exposure,
participants preferred words to part-words containing low-frequency
transitions but tended to consider phantom words as words of
the artificial language: They failed to prefer words to phantom
words. This remained true even after arbitrarily long exposure phases.
Importantly, participants also preferred phantom words to part-words
even when the latter sequences were presented more frequently during
the learning phase. The authors concluded that computing TPs is
not sufficient for the extraction of word-like units and that other cues
have to be processed for speech segmentation to occur (see Perruchet
& Tillmann, 2010, for a recent discussion on that topic).

Finally, a recent study in the visual domain (Orban, Fiser, Aslin, &
Lengyel, 2008) provided further arguments in favor of the chunking
hypothesis. In that study, participants learned scenes or assemblies of
visual shapes statistically organized in pairs. They were then presented
with two partial scenes and had to select the test scene that seemed more
familiar based on the scenes viewed during familiarization. One test item was a
combination of shapes from the training phase (or a part thereof, called
an embedded combination), and the other test item consisted of shapes
from two different pairs (a mixture combination). Orban et al. devised
an experiment that contained two sets of four shapes in which both the
first-order (frequency of shapes) and second-order statistics between shapes
(frequency of pairs) were made identical. However, the shapes in one of
the sets were always shown as triplet combinations, whereas the shapes in the
other group were shown individually (and occasionally all four of them
were presented together). Results in the test phase indicated that
participants formed chunks in the first but not in the second group of shapes.
Indeed, they were able to recognize triplets from the first group of four
against mixture triplets, and they were also able to distinguish between
triplets constructed from the elements of the two groups in a direct
comparison, but they did not distinguish triplets constructed
from the shapes of the second group of four from mixture triplets.

Orban et al. (2008) compared human performance to the
performance of two computational models: (a) an associative learner (AL)
that learns pair-wise correlations between shapes without an explicit
notion of chunks and (b) a chunking model implementing Bayesian
learning processes (BCL). Consistent with human performance, the
BCL successfully learned to distinguish between triplets constructed
from the elements of the two groups of four shapes, whereas the AL
was not able to make this distinction. Like human participants, both the
BCL and the AL correctly recognized triplets from the first group of
four shapes against mixture triplets. Unlike human participants, the
AL (but not the BCL) falsely recognized triplets constructed from the
shapes of the second group of four against mixture triplets.

To sum up, the available evidence in the auditory and visual
domains suggests that chunking models provide a better account of
human statistical learning abilities. These results are in contrast with
the sequence learning literature. In that domain, several studies have
shown that human performance could be accurately accounted for by
the mere associative learning implemented in the SRN (Cleeremans, 1993;
Cleeremans & McClelland, 1991; D. Servan-Schreiber, Cleeremans, &
McClelland, 1991). Specifically, the SRN has been shown to
reproduce RT learning curves in sequence learning studies even
though it does not form chunk-like representations. The SRN is a
connectionist network in which the levels of activation at the output
level are considered as preparation for the next sequence event.
To account for performance, the SRT task is viewed as a prediction task,
and high activation levels correspond to faster reaction times (RTs).
As Cleeremans and McClelland have shown, with training, the pattern
of activation at the output level represents more and more precisely
the transitional probabilities between any two sequence elements.
To account for recognition performance, the average output
activation is computed when a short sequence fragment is presented to the
network. In a two-alternative forced-choice (2AFC) task, the sequence
fragment producing the higher activation at the output level would be
considered as recognized or familiar.

A model such as PARSER, by contrast, does not have a direct way to
simulate RTs, but it can easily account for performance in a recognition 
or familiarity judgment task. When trained with an artificial language, 
a random parameter between 1 and 3 determines at each time step 
the number of elements (e.g., syllables) processed simultaneously by 
PARSER and stored as a new representational unit of the perceptual 
memory. Each of these new units receives an initial weight. The weights 
of the units increase each time they are processed again or decrease on 
each processing cycle. The value of the decrement depends on the for- 
getting and interference parameters. There is a threshold above which 
a given unit shapes perception. 
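
The weight bookkeeping described in this paragraph can be sketched as follows. This is only an illustration of the verbal description, not the PARSER implementation: the parameter values and the names `update_weights`, `shaping_units`, and `expose` are assumptions, interference is left out, and percepts are built from raw elements only (stored units do not shape later percepts here).

```python
import random

# Illustrative values only; they are assumptions, not taken from the article.
INITIAL_WEIGHT = 1.0
GAIN = 0.5
FORGETTING = 0.05
THRESHOLD = 1.0


def update_weights(weights, perceived_unit):
    """Decay every stored unit, then create or reinforce the perceived unit."""
    for unit in list(weights):
        weights[unit] -= FORGETTING
        if weights[unit] <= 0:
            del weights[unit]  # the unit has vanished from memory
    if perceived_unit in weights:
        weights[perceived_unit] += GAIN
    else:
        weights[perceived_unit] = INITIAL_WEIGHT
    return weights


def shaping_units(weights):
    """Units whose weight exceeds the threshold and could shape perception."""
    return {unit for unit, weight in weights.items() if weight > THRESHOLD}


def expose(stream, rng):
    """Walk through a stream, grouping 1 to 3 elements at each time step."""
    weights = {}
    i = 0
    while i < len(stream):
        size = rng.randint(1, 3)  # random parameter between 1 and 3
        update_weights(weights, tuple(stream[i:i + size]))
        i += size
    return weights


weights = expose([3, 1, 6, 4, 9, 7, 12, 10] * 50, random.Random(0))
print(sorted(shaping_units(weights)))
```

Because forgetting applies on every cycle, a unit survives only if it is perceived again often enough, which is the property the predictions in the following sections rely on.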



In a recognition trial in a 2AFC task, the response of the model 
will depend on the units stored in the perceptual shaper. If the units 
corresponding to the two test items are both represented in the 
perceptual shaper, the response of the model will correspond to the 
unit with the strongest weight. If only one item is represented, it will 
correspond to the model's response. If none of the items is repre- 
sented, the model's choice is determined randomly (see, e.g., Giroux & 
Rey, 2009). 
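
The decision rule just described can be written down in a few lines. The sketch below is a hedged illustration: the function name `choose_2afc`, the default threshold, and the treatment of "represented" as "weight above threshold" are assumptions, and ties between two represented items are resolved in favor of the first item for simplicity.

```python
import random


def choose_2afc(weights, item_a, item_b, threshold=1.0, rng=random):
    """Pick one of two test items from the weights of stored units."""
    a_known = weights.get(item_a, 0.0) > threshold
    b_known = weights.get(item_b, 0.0) > threshold
    if a_known and b_known:
        # Both items are represented: the stronger unit wins.
        return item_a if weights[item_a] >= weights[item_b] else item_b
    if a_known:
        return item_a
    if b_known:
        return item_b
    # Neither item is represented: guess.
    return rng.choice([item_a, item_b])


stored = {"1-2-3": 2.4, "3-7-8": 1.3}  # hypothetical weights
print(choose_2afc(stored, "1-2-3", "3-7-8"))
print(choose_2afc(stored, "1-2-3", "1-4-7"))
```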

In the next section, we will describe how we contrasted the 
predictions of these two models in the context of a choice RT task 
implementing the statistical regularities of an artificial language 
similar to those used by Saffran and collaborators (e.g., Saffran et al., 
1999). We did not run simulations but conducted two experiments 
in which different predictions can be made according to a chunk- 
ing model such as PARSER or a transition-finding model such 
as the SRN. 

OVERVIEW OF THE EXPERIMENTS 

To contrast the predictions of chunking and transition-finding
strategies, we used a 12-choice SRT task in which the succession of the visual
targets implemented statistical regularities similar to those found in
artificial languages. We chose to use a visuomotor task instead of
presenting the artificial language in the auditory modality in order to be
able to track the development of statistical learning through reaction
times (see Misyak, Christiansen, & Tomblin, 2010, for a recent similar
attempt; see also Conway & Christiansen, 2009, for a systematic
comparison between the auditory and visual modalities). In our version
of the task, participants had to learn two different artificial languages
presented successively. The first "language" (L1)
was composed of four "words", or small two-element sequences, and
the second "language" (L2) was composed of four small three-element
sequences. In one (control) condition, the two ensembles were not
related to each other, but in the other (experimental) condition, the
intra-sequence transitions of L1 became inter-sequence transitions
in L2 (see Figure 1 and Table 1).

TABLE 1.

The Four Three-Element Sequences Used During Language 1 (L1) and Language 2 (L2) Training in the Control and
Experimental Conditions.

              L1                           L2
Control       Experimental      Control and experimental
3-1           3-4               1-2-3
6-4           6-7               4-5-6
9-7           9-10              7-8-9
12-10         12-1              10-11-12

Note. L1 differs between the control and experimental conditions, but L2 is the same in the two conditions.

[Figure 1: example target streams during training on L1 (control and experimental conditions) and during training on L2 (control/experimental).]

FIGURE 1.

In the control condition, Language 1 (L1) and Language 2 (L2) are unrelated. In the experimental condition, some of L1 "intra-word"
transitions become L2 "inter-word" transitions.



TABLE 2.

The Test Items Used in Experiments 1 and 2.

                      Test items
L2 sequences      Part-L2 sequences      Non-L2 sequences
1-2-3             3-7-8                  1-4-7
4-5-6             6-10-11                3-11-8
7-8-9             9-1-2                  5-2-10
10-11-12          10-4-5                 6-12-9

Note. L1 = Language 1. L2 = Language 2.



The SRN and PARSER would predict two different outcomes in this
situation. Namely, these two models make distinct predictions
regarding the way L1 learning influences L2 learning, as indexed by L2
sequence recognition performance in the 2AFC task. However, both
chunking and associative processes would predict faster RTs within than
between sequences, as performance can be improved either because the
transitional probability between two sequence elements is high or
because these two sequence elements are part of the same chunk.

The probability that one sequence element follows another in the
input stream is 100% within words and 33% between words (since
there are four different sequences and no repetitions). After a sufficient
amount of training, a system such as the SRN will learn these
transitional probabilities so that it would perfectly predict Element 1
when presented with the input Element 3. When switching from L1
to L2, a transition-learner will thus have to develop new associations
between elements in the control condition. For instance, while "3" was
only associated with "1" in L1, the system will learn to predict "4," "7,"
or "10" after the presentation of "3" when presented with L2 because
the sequence "1-2-3" could be followed by "4-5-6", "7-8-9", or
"10-11-12". In the experimental condition, after training on L1, which
comprises the sequence "3-4", the system will simply have to "tune" the
strength of the association between "3" and "4", as it is only 33% in L2
and not 100% as it was in L1.
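
The 100% and 33% figures can be checked by simulation. The following is a minimal sketch, assuming that each sequence is followed with equal probability by any of the three other sequences; the helper names `make_stream` and `transitional_probabilities` and the stream length are illustrative choices, not part of the study.

```python
import random
from collections import Counter

L2_SEQUENCES = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]


def make_stream(sequences, n_sequences, rng):
    """Concatenate sequences in random order, never repeating one directly."""
    stream, previous = [], None
    for _ in range(n_sequences):
        current = rng.choice([s for s in sequences if s != previous])
        stream.extend(current)
        previous = current
    return stream


def transitional_probabilities(stream):
    """TP(a -> b) = count(a followed by b) / count(a followed by anything)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tps = transitional_probabilities(make_stream(L2_SEQUENCES, 10_000, rng))
print("within: ", tps[(1, 2)], tps[(2, 3)])
print("between:", round(tps[(3, 4)], 2), round(tps[(3, 7)], 2), round(tps[(3, 10)], 2))
```

Within-sequence transitions come out at exactly 1.0, whereas the three possible successors of "3" each hover around one third, which is the association a transition-learner would only need to retune in the experimental condition.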



Turning now to the chunking process, recall that in PARSER, each
element is associated with the other elements of the same chunk but
there is no association whatsoever with the other elements. In other
words, there is a within-chunk strength of 100% and a between-chunk
strength of 0%. Due to the interference and forgetting parameters, the
formed representational units will progressively vanish unless they
are presented again in the input stream. At some point, the content of
the perceptual memory will correspond to the largest possible chunks
that could be extracted from the input sequence. At that point, a given
element is included in one chunk only. For instance, if Element 3
is part of the chunk "3-1", it cannot also be associated with other
elements in order to form a chunk "1-2-3". As a consequence, when
presented with L2, the learning system will first have to break the chunks
formed during training on L1 in order to form the new L2 chunks. This
task should be easier in the control than in the experimental group
since, in the former case, L1 transitions are no longer presented
during L2. L1 chunks will then progressively decay and be replaced by L2
chunks. By contrast, in the experimental condition, L1 transitions are
still presented, although less frequently, between L2 sequences. As a
result, L1 chunks continue to be reinforced during L2 presentation.
It will then be more difficult for a chunking system to learn L2 after L1
in the experimental condition. If human learning is based on similar
chunking processes, one might therefore expect better recognition of
L2 sequences in the control than in the experimental condition.

Another prediction concerns recognition performance with
"part-sequences" (i.e., three-element chunks that span a transition
between two sequences). In the test phase, participants were presented
with three types of test items (see Table 2): three-element sequences of
L2, "non-sequences" of L2 (three-element sequences involving
transitions that were never presented in the exposure phase), and
"part-sequences" (involving one transition that was part of an L2 sequence
and one "between-sequence" transition of L2). If learning is based on
transitional probabilities, participants may find it more difficult to
exclude part-sequences than non-sequences as, on average, the
association strength between sequence elements is higher in the former than
in the latter case. By contrast, if performance is based on chunking
processes similar to those of PARSER, participants should exclude
part-sequences as easily as non-sequences as, in both cases, these items
are not part of the perceptual memory.
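
The contrast drawn above can be made concrete by scoring test items with the transitional probabilities implied by the L2 design. This is a sketch under stated assumptions: the mean TP of an item is used as a simple stand-in for "association strength" (it is not the output of a trained SRN), and `design_tp` and `mean_tp` are hypothetical helper names. One example item of each type is taken from Table 2.

```python
L2_SEQUENCES = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]


def design_tp(a, b, sequences=L2_SEQUENCES):
    """Transitional probability a -> b implied by the L2 design."""
    for seq in sequences:
        if a in seq[:-1]:
            # a lies inside a sequence, so its successor is fixed.
            return 1.0 if seq[seq.index(a) + 1] == b else 0.0
    # a ends a sequence: each of the other sequences follows equally often.
    starts_other = any(seq[0] == b and seq[-1] != a for seq in sequences)
    return 1.0 / (len(sequences) - 1) if starts_other else 0.0


def mean_tp(item):
    """Average TP over the successive transitions of a test item."""
    pairs = list(zip(item, item[1:]))
    return sum(design_tp(a, b) for a, b in pairs) / len(pairs)


for label, item in [("L2 sequence", (1, 2, 3)),
                    ("part-sequence", (3, 7, 8)),
                    ("non-sequence", (1, 4, 7))]:
    print(label, item, round(mean_tp(item), 3))
```

A transition-based learner would thus rank the part-sequence between the L2 sequence and the non-sequence, whereas a chunk-based learner that stores only "1-2-3" treats the other two items alike.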

EXPERIMENT 1

The goal of Experiment 1 was twofold. First, we wanted to make sure that
participants could learn statistical regularities similar to those used in
artificial languages in the context of an SRT task. Second, we wanted to
establish whether they would be able to recognize the L2 "words", that is,
the three-element sequences presented in a random order during the
SRT task. If learning is based on chunking, recognition performance
should be the same for non-sequences and part-sequences. If
performance is based on learning transitional probabilities, participants may
more frequently consider part-sequences than non-sequences as L2
sequences. The chunking hypothesis also predicts better L2 sequence
recognition in the control than in the experimental condition.

Method 

PARTICIPANTS, APPARATUS, AND STIMULI 

Twelve undergraduate students (eight female and four male;
M age = 20.9) of the Université Libre de Bruxelles took part in the
experiment in exchange for course credits. All reported normal or
corrected-to-normal vision. This experiment was approved by the
Ethics Committee of the Faculté des Sciences Psychologiques et de
l'Éducation (Faculty of Psychology and Education) of the Université
Libre de Bruxelles.

The experiment was run on a Mac mini computer equipped with 
a touch sensitive screen monitor. The display consisted of 12 invisible 
dots arranged in a square on the computer's screen. Each dot repre- 
sented a possible position of the visual moving target. The stimulus was 
a small red circle 0.65 cm in diameter that appeared on a gray back- 
ground, centered 0.10 cm below one of the 12 invisible dots separated 
by 2.20 cm. 

The stimulus set consisted of sequences of visual locations in 
which the visual target could occur on one out of 12 different positions 
(numbered from 1 to 12, see Table 1). In the control condition, L1
contained four two-location sequences: "3-1", "6-4", "9-7", and "12-10".
In the experimental condition, the sequences were "3-4", "6-7", "9-10", 
and "12-1". In both conditions, L2 contained four three-location se- 
quences: "1-2-3", "4-5-6", "7-8-9", and "10-11-12". The stimuli were 
presented in a pseudo-random order: a sequence was never directly 
repeated. A different mapping between the 12 sequence elements and 
the 12 screen locations was used for each participant. 

PROCEDURE 

The experiment consisted of nine training blocks during which
participants were exposed to two different language-like sequences in an
SRT task. In the first three training blocks, they were exposed to a first
language (L1) composed of four two-location "words" or sequences
(see Table 1). Each sequence was presented 200 times, for a total of
1,600 trials. In the six subsequent blocks, participants were exposed
to a second language (L2) composed of four three-location sequences
presented 250 times each, for a total of 3,000 trials. L2 training was
longer than L1 training in order to make sure that the second language,
which would subsequently be tested in a 2AFC task, was learned. On each
trial, a stimulus appeared at one of the 12 possible positions. Participants
were instructed to press the location of the target as fast as possible with
the ad hoc pen. The target was removed as soon as it had been pressed,
and the next stimulus appeared after either a 250 ms response-stimulus
interval (RSI) for intra-sequence transitions or a 750 ms RSI for
inter-sequence transitions. Participants were not informed that the sequence
of locations corresponded to the succession, in a random order, of the
four sequences of the artificial languages. They were allowed to take
short rest breaks between blocks.
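
The trial counts and RSI rule described above can be reproduced with a short schedule generator. This is a sketch, not the software used in the study: shuffling in rounds of four sequences is an assumed way of obtaining equal presentation counts without direct repetitions, the names `sequence_order` and `trials` are hypothetical, and the participant-specific mapping of elements to screen locations is omitted.

```python
import random

L1_CONTROL = [(3, 1), (6, 4), (9, 7), (12, 10)]
L2 = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]
INTRA_RSI_MS, INTER_RSI_MS = 250, 750


def sequence_order(sequences, repetitions, rng):
    """Order in which each sequence occurs `repetitions` times, never twice in a row."""
    order = []
    for _ in range(repetitions):
        block = sequences[:]
        rng.shuffle(block)
        while order and block[0] == order[-1]:
            rng.shuffle(block)  # avoid a direct repetition across rounds
        order.extend(block)
    return order


def trials(sequences, repetitions, rng):
    """Return (location, RSI in ms following the response) for every trial."""
    schedule = []
    for seq in sequence_order(sequences, repetitions, rng):
        for position, location in enumerate(seq):
            is_last = position == len(seq) - 1
            schedule.append((location, INTER_RSI_MS if is_last else INTRA_RSI_MS))
    return schedule


rng = random.Random(0)
print(len(trials(L1_CONTROL, 200, rng)))  # 4 sequences x 200 x 2 elements
print(len(trials(L2, 250, rng)))          # 4 sequences x 250 x 3 elements
```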

Participants were randomly assigned to two conditions. In the
experimental condition, one third of the inter-sequence transitions of L2
were identical to L1 sequences (see Figure 1). This was not the case in
the control condition, in which L1 and L2 were unrelated. L1 differed
between control and experimental conditions whereas L2 was the same
in both conditions. In the experimental condition, the intra-sequence
transitions of L1 became inter-sequence transitions in L2. For
example, the L2 sequence "1-2-3" is followed by "4-5-6" in one third of
the cases. In that case, the inter-sequence transition "3-4" corresponds
to an L1 sequence.

All participants subsequently performed a recognition task in 
which they had to decide whether they had been exposed to each 
sequence during the training phase or not. Three types of sequences 
were presented (see Table 2): the sequences from L2 (each sequence 
presented twice); four part-sequences, that is, sequences composed of
the end of a sequence and the beginning of another sequence; and four 
non-sequences, corresponding to visual sequences which had never 
been presented during L2 training. In total, the experiment lasted ap- 
proximately 50 min. 

Results 

REACTION TIME RESULTS 

To assess whether participants were able to learn L1 and L2, we
examined separately mean RTs for the first three blocks (L1) and for the
next six blocks (L2) in the control and experimental conditions. Recall
that the stimulus material was such that the first element of each
sequence was unpredictable, whereas the second element (and third
element in L2) was completely predictable. Figure 2 (left panel) shows the
average RTs obtained over the entire experiment, plotted separately for
each element of the sequences. Given that participants performed
similarly in the control and in the experimental conditions, F(1, 10) = 2.113,
p > .1, for L1; and F(1, 10) = 0.481, p > .5, for L2, we pooled them
together. It appeared that participants' responses were strongly influenced
by the serial position within each sequence: RTs decreased more and
were faster for predictable elements than for unpredictable elements
(cf. Figure 2). Two two-way analyses of variance (ANOVAs) conducted
on mean RTs confirmed these impressions. First, we examined the
first three blocks (L1) by using an ANOVA with Block (3 levels) and
Element (2 levels: predictable and unpredictable) as repeated measures
factors. This analysis revealed significant main effects of block,
F(2, 10) = 56.007, p < .0001, partial η² = .804, and element, F(1, 10) = 15.431,
p < .005, partial η² = .520. The interaction also reached significance,
F(2, 10) = 6.630, p < .01, partial η² = .399. Second, we examined the
next six blocks (L2) by using an ANOVA with Block (6 levels) and
Element (3 levels) as repeated measures factors. A significant main
effect of block was found, F(5, 50) = 15.113, p < .0001, partial η² = .592.
The analysis also revealed a significant main effect of element,
F(2, 20) = 25.141, p < .0001, partial η² = .707. The interaction also
reached significance, F(10, 100) = 6.220, p < .0001, partial η² = .377.

RECOGNITION TASK RESULTS 

Figure 2 (right panel) shows recognition performance for the three 
types of test sequences plotted separately for control and experimen- 
tal conditions. Inspection of the figure indicates that the participants 
recognized L2 sequences, non-sequences, and part-sequences in the 
two conditions. 

TABLE 3.

t Values Comparing Recognition Scores to Chance Level in
Control and Experimental Conditions for the Three Types of Test
Sequences.

                 Words      Non-words      Part-words
Control          5.82*      2.91*          3.79*
Experimental     2.89*      2.44*          2.15*

* p < .05 (one-tailed).

In order to ensure that the sensitivity of the recognition measure
is independent of response bias, that is, is not affected by the
participants' own report criteria, we used signal detection theory in the
same way as Tunney and Shanks (2003). For each participant, we
computed a d' value reflecting the ability to discriminate between old
and new sequences. Hits correspond to "yes" responses to old triplets
(correct responses), and false alarms correspond to "yes" responses
to new sequences (incorrect responses). A one-sample t-test on the
d' distribution indicates that, on average, participants were able to
discriminate between old and new triplets, mean d' = 3.022, t(11) = 4.567,
p = .001. Performance was above chance level in both conditions and
for each type of test triplet as confirmed by a series of one-tailed t-tests
(see Table 3).
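
For readers unfamiliar with the measure, d' is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below shows one common way of computing it; the log-linear correction used to keep rates away from 0 and 1 is an assumption made for the example (the article does not say how extreme rates were handled), and the response counts are invented.

```python
from statistics import NormalDist


def d_prime(hits, n_old, false_alarms, n_new):
    """d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to counts, 1 to totals) is applied so
    that rates of exactly 0 or 1 do not produce infinite z scores. This
    correction is an assumption of the sketch.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 8 old items (4 L2 sequences shown twice) and
# 8 new items (4 part-sequences and 4 non-sequences).
print(round(d_prime(hits=7, n_old=8, false_alarms=2, n_new=8), 2))
```

A participant who says "yes" equally often to old and new items obtains d' = 0 whatever their overall tendency to say "yes", which is why the measure is independent of the response criterion.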

More importantly, performance was reliably better for L2 sequences
in the control condition as compared to the experimental condition,
t(47) = 1.70, p < .05 (one-tailed). All the other comparisons were
not significant.

Discussion 

Our SRT results indicate that participants learned L1 and L2 in both
experimental and control conditions. The recognition results showed
that participants were able to discriminate the sequences of L2.
Importantly, performance was better in the control condition than
in the experimental condition, that is, when the two
language-like sequences did not share any transitions between elements.
Taken together, these results are in line with the notion that
participants learned the sequences based on parsing mechanisms.

Recall that, in the experimental condition, L1 transitions (e.g.,
"3-4") were still presented between sequences during L2 presentation
(e.g., between "1-2-3" and "4-5-6"). As a result, L1 chunks continue
to be reinforced during L2 presentation. A chunking
model such as PARSER would therefore predict better L2 recognition
in the control than in the experimental condition. Indeed, it should
be more difficult for such a model to develop new representations for
the new L2 sequences if the previous, conflicting representations
developed during L1 were still reinforced.

FIGURE 2.

The figure shows mean reaction times (RTs) obtained for unpredictable (Element 1) and predictable elements (Elements 2 and 3)
during Language 1 (L1) and Language 2 (L2) blocks. RTs are averaged over experimental and control conditions (left panel). Mean
percentage of correct responses during the recognition task for words, non-words, and part-words in the control and experimental
conditions are displayed on the right panel. Chance level = 50%.

The observation that non-sequences and part-sequences rejection
did not differ between the two conditions also fits with the prediction
of a chunking model. The representational units that result from
learning in such a model do not reflect the actual transitional
probabilities present in the training sequence. The probability of
erroneously considering a test sequence as a sequence of L2 should
then not be higher for part-sequences than for non-sequences even
though the transitional probabilities are, on average, higher in the
former case.
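
The chunking prediction described in this paragraph can be sketched with a deliberately simplified stand-in for a chunk-based recognizer. This is not PARSER itself: the function `chunk_match`, the stored chunks, and the test items are hypothetical, and the sketch assumes that learning has ended with exactly the L2 triplets stored as units.

```python
def chunk_match(item, lexicon):
    """Return 1.0 if the test item matches a stored chunk exactly, else 0.0."""
    return 1.0 if tuple(item) in lexicon else 0.0


# Hypothetical chunks assumed to have been extracted from L2
# (illustrative only, not the authors' stimuli).
lexicon = {(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)}

test_items = {
    "sequence": (1, 2, 3),       # matches a stored chunk
    "part-sequence": (2, 3, 4),  # spans a boundary between two chunks
    "non-sequence": (1, 5, 9),   # elements never presented in succession
}

familiarity = {label: chunk_match(item, lexicon)
               for label, item in test_items.items()}
```

Under this assumption, part-sequences and non-sequences receive the same familiarity score because neither matches a stored unit, whatever the transitional probabilities they contain.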

In Experiment 1, however, sequences were clearly identified by
the use of a larger RSI for inter-sequence than for intra-sequence
transitions. It therefore remains possible that our results depend on
this particular presentation mode. In other words, learning would fit
with chunking models simply because the input stream was already
parsed into consistent chunks. To address this possibility, we
conducted a second experiment in which the RSI was set to a constant
value.

EXPERIMENT 2 

Participants, apparatus, stimuli, 
and procedure 

Ten undergraduate students (six female, four male; M age = 21.3) of the
Université Libre de Bruxelles took part in the experiment in exchange
for course credits. All reported normal or corrected-to-normal vision.
This experiment was approved by the Ethics Committee of the Faculté
des Sciences Psychologiques et de l'Éducation (Faculty of Psychology
and Education) of the Université Libre de Bruxelles.

The apparatus and display were identical to those used in
Experiment 1. The procedure was identical to the one used in
Experiment 1 except that the RSI was fixed at 250 ms for both
intra-sequence and inter-sequence transitions. The stimuli were
identical to those used in Experiment 1. A different mapping was also
used for each participant in this experiment even though a given
sequence element was associated with only 10 out of the 12 possible
screen locations.



Results 

REACTION TIME RESULTS 

Figure 3 (left panel) shows the average RTs obtained over the entire
experiment, plotted separately for each element of the sequences. As
in Experiment 1, control and experimental conditions were pooled
together since there was no difference in performance between the two
conditions, F(1, 8) = 1.114, p > .1, for L1; and F(1, 8) = 0.042, p > .5,
for L2. The results clearly indicate that RTs are strongly influenced by
the position: RTs decreased more and were faster for predictable
elements than for unpredictable elements.

Two two-way ANOVAs conducted on mean RTs confirmed these
impressions. First, we examined the first three blocks (L1) by using
an ANOVA with Block (3 levels) and Element (2 levels: predictable
and unpredictable) as repeated measures factors. This analysis
revealed a significant main effect of block, F(2, 16) = 37.227, p < .0001,
partial η² = .807; and of element, F(1, 8) = 9.720, p < .05, partial
η² = .525. The interaction also reached significance, F(2, 16) = 7.337,
p < .005, partial η² = .474. Second, we examined the next six blocks (L2)
by using an ANOVA with Block (6 levels) and Element (3 levels) as
repeated measures factors. We found a significant main effect of block,
F(5, 40) = 9.657, p < .005, partial η² = .490. The analysis also revealed
a main effect of element, F(2, 16) = 8.404, p < .005, partial η² = .472.
The interaction also reached significance, F(10, 80) = 6.914, p < .0001,
partial η² = .437.

RECOGNITION TASK RESULTS 

To analyze recognition performance, we first computed a d'
value as in Experiment 1. A one-sample t test on the d' distribution
showed that participants were able to discriminate between old and
new sequences, mean d' = 1.150, t(9) = 3.035, p = .014. As indicated
in Table 4, participants were able to correctly reject non-sequences
in both conditions. They did not, however, correctly reject
part-sequences. Concerning L2 sequences, experimental participants
recognized them above chance, but this was not the case for control
participants.

FIGURE 3.

The figure shows mean reaction times (RTs) obtained for unpredictable (Element 1) and predictable elements (Elements 2 and 3)
during Language 1 (L1) and Language 2 (L2) blocks. RTs are averaged over experimental and control conditions (left panel). Mean
percentage of correct responses during the recognition task for words, non-words, and part-words in the control and experimental
conditions are displayed on the right panel. Chance level = 50%.

TABLE 4.

t Values Comparing Recognition Scores to Chance Level in
Control and Experimental Conditions for the Three Types of Test
Sequences.

                Words    Non-words    Part-words
Control         1.24     3.21*        1.63
Experimental    2.36*    3.17*        0.41

* p < .05 (one-tailed).

Second, we analyzed the proportions of correct recognitions,
plotted in Figure 3 (right panel), measured in the two conditions and
for the three types of test sequences. Overall, performance did not
significantly differ between control and experimental conditions (for
all differences p > .05). Therefore, we pooled control and experimental
conditions together and compared performance for non-sequences,
part-sequences, and L2 sequences. This analysis revealed a significant
difference between non-sequences and part-sequences, paired
t(78) = 1.574, p < .05. Non-sequences were correctly rejected reliably
more often than part-sequences (see Figure 3, right panel). The other
comparisons failed to reach significance.

Discussion 

In Experiment 2, L1 and L2 were presented using a constant RSI.
As in Experiment 1, participants learned the first and second languages.
Indeed, throughout training, mean RTs decreased more for predictable
than for unpredictable elements. Moreover, participants recognized
L2 sequences, at least in the experimental condition, and correctly
rejected non-sequences. Interestingly, in both experimental and control
conditions, participants performed better in rejecting non-sequences
than part-sequences, which were not correctly rejected.

According to PARSER, performance should be the same for
non-sequences and part-sequences. If participants formed L2 chunks
during training, it should be as easy to reject non-sequences as
part-sequences, as neither type of sequence matches the units formed
during training. By contrast, the SRN predicts that participants
should recognize L2 sequences, which correspond to high transitional
probabilities, and reject non-sequences, which correspond to low
transitional probabilities. However, as part-sequences involved high
transitional probabilities, the SRN may have more difficulty in
rejecting them. The results of Experiment 2 nicely fit with the SRN
predictions, suggesting that participants are indeed sensitive to the
actual values of the transitional probabilities between sequence
elements. When considering Experiments 1 and 2 together, our results
suggest that the values of transitional probabilities influence
performance when temporal cues do not guide the chunking process.
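
The statistical account discussed in this paragraph can be illustrated by estimating transitional probabilities directly from a training stream. The sketch below is not an SRN and does not use the authors' stimuli: the four triplets, the stream length, the random seed, and the helper names `transitional_probabilities` and `mean_tp` are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def transitional_probabilities(stream):
    """Estimate P(next | current) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def mean_tp(item, tps):
    """Average transitional probability over an item's adjacent pairs."""
    pairs = list(zip(item, item[1:]))
    return sum(tps.get(pair, 0.0) for pair in pairs) / len(pairs)


# Hypothetical toy language of four triplets (not the authors' stimuli).
words = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]
rng = random.Random(0)
stream, previous = [], None
for _ in range(2000):
    # Random ordering of the triplets without immediate repetition.
    word = rng.choice([w for w in words if w != previous])
    stream.extend(word)
    previous = word

tps = transitional_probabilities(stream)
word_score = mean_tp((1, 2, 3), tps)       # two within-triplet transitions
part_word_score = mean_tp((2, 3, 4), tps)  # one within, one across a boundary
non_word_score = mean_tp((1, 5, 9), tps)   # transitions that never occur
```

In this toy setting, a part-sequence falls between a sequence and a non-sequence in mean transitional probability, which is why a learner sensitive to these values would find part-sequences harder to reject than non-sequences.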



GENERAL DISCUSSION 

In this paper, we aimed at clarifying the nature of the representations 
involved in implicit and statistical learning. The question was to assess 
whether participants form chunks of the training material or merely 
develop a sensitivity to the transitional probabilities present in the 
training sequence. In line with previous studies showing that statistical 
learning of pseudolinguistic regularities can occur in other modalities 
than the auditory modality, we showed, in the context of a visuo-motor 
RT task, that participants learn the statistical regularities present in a 
random succession of sequences of visual targets. The RT results indicate
that participants were able to learn two different languages (L1 and
L2) presented successively. Moreover, they were also able to recognize 
L2 sequences in a subsequent recognition task. 

When sequences were clearly separated from each other in
Experiment 1, recognition performance was better in the control
condition, in which L1 and L2 did not share any pairwise transitions
between sequence elements. These results are in line with the notion
that word-like, rigid, disjunctive units are developed during learning.
However, chunk formation seems not to be automatic in our task.
When sequences were not clearly identified in Experiment 2, that is,
when they were presented in a continuous stream without any temporal
cue to guide the chunking process, recognition performance was more
strongly affected by the actual values of the transitional probabilities
between sequence elements. This was reflected in Experiment 2 by
better rejection of non-sequences than part-sequences in the recognition
task. This pattern of results is in line with previous studies showing that
the temporal distribution of the input affects statistical learning in the
visual modality (Conway & Christiansen, 2009).

How are these findings related to natural language segmentation?
A large body of evidence indicates that, in the absence of a clear
word-boundary cue in the signal, word segmentation in natural language
is based on lexical, sublexical, phonetic, phonotactic, and prosodic
cues (Mattys, Jusczyk, Luce, & Morgan, 1999). Research on natural
speech indicates that lower level, signal-contingent cues will be more
prone to influence segmentation when the availability of higher level
lexical information decreases (Mattys, White, & Melhorn, 2005). In the
same way, our results suggest that in the absence of a clear temporal
cue, recognition performance is more affected by the strength of the
transitional probabilities, as it is the only available cue to find
between-sequence boundaries.

In line with a modality-constrained view of implicit statistical
learning (Conway & Pisoni, 2008), previous studies have shown that
statistical learning was differentially affected by training conditions
in the auditory and visual modalities (Conway & Christiansen, 2005,
2009; Saffran, 2002). The rate of presentation of the input stream, for
instance, has a stronger influence on the statistical learning of
sequential regularities in the visual than in the auditory modality. Our
results also suggest that the nature of the statistical learning
processes involved in our visuo-motor task could be modulated by the
rate of presentation of the sequence of visual targets. Participants
chunk the sequence according to the "words" of the artificial language
when they are clearly marked by the temporal structure of the input,
but not otherwise.

Another possible explanation for this result could be that
participants did form chunks in Experiment 2 but not those that
corresponded to the actual L2 sequences. It is possible that
participants indeed parsed the continuous sequence of visual stimuli
into smaller chunks but that these chunks did not respect the actual
boundaries between L2 sequences. It is also not necessarily the case
that all sequential transitions end up as being part of a larger chunk.
Participants may have focused, for instance, on particularly salient
transitions (e.g., between elements that were spatially close to each
other or between alternating locations) and ended up with larger,
smaller, or different chunks than those corresponding to the sequences
of the artificial language. In other words, if chunking is not directly
induced by the presentation mode, attentional factors may also
influence chunk formation. As a consequence, the actual chunks may
differ from one participant to another and may not strictly reflect the
transitional probabilities between the different sequence elements.
This may, of course, influence recognition performance, as the
different parsings adopted by different participants would tend to
cancel each other out.

Both the SRN and PARSER implement elementary associative 
learning mechanisms such that, in both cases, the system tends to
associate elements that often occur in succession. As a consequence, even
if the chunks resulting from training do not correspond to the actual 
sequences of the artificial language, there is a good chance that they 
involve highly frequent transitions. Participants may therefore tend to 
erroneously consider these part-sequences as sequences of the artificial 
language because they involve such high-frequency transitions. 

It therefore remains possible that participants were not sufficiently
trained on L2 in Experiment 2 to form the correct chunks of the
second language. Recognition performance would then reflect
intermediate chunk formation, and these intermediate representations
would necessarily correspond more to part-sequences than to the
never presented non-sequences.

Even though we cannot strictly exclude the possibility that chunks
were formed in Experiment 2, recent sequence learning results suggest
that chunking does not take place when a cue inducing specific
segmentation is removed. Jiménez, Méndez, Pasquali, Abrahamse,
and Verwey (2011) also addressed the notion that chunking could be
the main learning mechanism underlying sequence learning. They
proposed a new index to capture segmentation in learning, based
on the variance of responding to different parts of a sequence. They
reasoned that discontinuous performance (indicating chunking
processes) could be revealed through the observation of an increase in
RT variance. Indeed, as participants should respond faster to sequence
elements within- than between-chunks, RT variance should increase
over training if learning is based on a growing number of chunks. As
predicted, Jiménez et al. observed that participants who were induced
to parse the sequence in a uniform way by using color cues responded
much faster to the trials internal to a chunk than to those
corresponding to the transition between successive chunks. By contrast,
when the color cues were removed in a transfer phase, they no longer
responded faster to within-chunk transitions. As a matter of fact, they
did not respond differently from control participants who were not
trained to chunk by colors beforehand. In line with our study, Jiménez
et al. concluded that chunk learning arises when induced by a salient
cue (the RSI in our study, the color of the stimulus in Jiménez et al.)
while statistical learning of transitional probabilities occurs in more
implicit settings, and in the absence of such salient cues.
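
The logic of a variance-based segmentation index can be sketched as follows. This is only an illustration of the reasoning summarized above, not the exact index proposed by Jiménez et al.: the function `rt_variance_index` and the mean RTs (in ms) are hypothetical.

```python
import statistics


def rt_variance_index(position_rts):
    """Variance of mean RTs across the serial positions of a sequence."""
    return statistics.pvariance(position_rts)


# Hypothetical mean RTs (ms) for six successive positions (two triplets).
# Chunked responding: slow at chunk onsets, fast within chunks.
chunked_rts = [520, 410, 405, 525, 415, 400]
# Uniform responding: no systematic difference between positions.
uniform_rts = [450, 445, 455, 448, 452, 450]

chunked_index = rt_variance_index(chunked_rts)
uniform_index = rt_variance_index(uniform_rts)
```

Discontinuous, chunk-based responding yields a larger variance across positions than uniform responding, which is the pattern such an index is meant to detect.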

Another central debate in the literature concerns the degree to
which statistical learning depends upon specialized processes devoted
to finding word boundaries (domain-specific processes) or is based on
domain-general mechanisms dedicated to statistical computations. This
issue is mostly discussed in the field of developmental psychology,
where the question is whether infants' cognition is best viewed as the
deployment of innate skills (Carey, 1999) or whether more weight
should be put on the potential role of environmental structure in
guiding development (Kirkham, Slemmer, & Johnson, 2002). In the
latter view, the initial state would be better characterized by
domain-general mechanisms that would adapt themselves to different
types of input in different modalities (Dawson & Gerken, 2009;
Karmiloff-Smith, 1992). That question is clearly beyond the scope of
this study, but our results suggest that learning can indeed be "tuned"
to the properties of the input.

Finally, as mentioned before, statistical learning was initially
demonstrated in infants presented with a continuous stream of
syllables (Saffran et al., 1996). Previous studies have also shown that
infants and children were able to learn the statistical regularities
present in a sequence of movements of a visual object (Kirkham, Slemmer,
Richardson, & Johnson, 2007) or within a sequence of different visual
shapes (Kirkham, Slemmer, & Johnson, 2002). Future work is needed,
however, in order to measure whether the representational format of
the acquired knowledge may also differ in infancy depending on
training conditions.

In summary, this study suggests that when units are marked by a
temporal cue, the chunking models provide reliable assumptions
concerning the nature of the representations developed during learning.
However, in the absence of cues guiding the chunking processes,
performance appears to reflect the sensitivity to the strength of the
transitional probabilities. This study suggests that prediction-based and
clustering processes are not necessarily mutually exclusive but could
be differentially associated with performance depending on training
conditions.